SIX-TERM KARATSUBA-VARIANT CALCULATOR



TECHNICAL FIELD 

This invention generally relates to technology involving large-scale 
computations. 

BACKGROUND 

Multiplying two polynomials (or integers) efficiently is a key issue in a 
variety of academic fields and practical applications. Examples of such fields and 
applications include (but are not limited to): signal processing, cryptography, 
digital security systems, computer science, and number theory. 

Multiplication — Classic Schoolbook Style 

The conventional (i.e., "classic schoolbook") style of multiplication in
positional number systems requires approximately c*m*n operations to multiply
an m-place (e.g., m-digit) number by an n-place (e.g., n-digit) number, for some
constant c. When m = n, this conventional procedure for multiplication of two
n-place numbers requires an execution time approximately proportional to n^2 as n
increases. This is sometimes written O(n^2).

Put another way, this classic schoolbook approach to multiplying two
n-digit numbers has a cost of n^2 operations. The overall cost is the number
of basic operations (e.g., addition, subtraction, and multiplication of single-digit
numbers) required to complete a task; however, the focus is often on the
multiplications, since they typically require significantly more processing
resources than additions and subtractions. The task in this case is the
multiplication of two n-digit numbers.

This conventional multiplication approach has many different names. For 
example, it may be called "brute-force", "long", "classic", and such. Herein, it is 
referred to as "classic schoolbook" multiplication. 

It is also used for the multiplication of polynomials. For example, let n be a
positive integer. The "classic schoolbook" way to multiply two univariate
polynomials of degree at most n - 1 (i.e., with n terms each, some of whose
coefficients may be zero) needs n^2 multiplications of coefficients. It multiplies
each coefficient of one polynomial by each coefficient of the other, adding the
products where needed.
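
The schoolbook procedure just described can be sketched as follows. This is a minimal illustration only; the function name `schoolbook_mul` and the convention of storing coefficients lowest degree first are assumptions made for the example, not part of the described implementation.

```python
def schoolbook_mul(a, b):
    """Multiply two coefficient lists (lowest degree first).

    Returns the product coefficients together with the number of
    coefficient multiplications performed, which is len(a) * len(b).
    """
    product = [0] * (len(a) + len(b) - 1)
    mults = 0
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            product[i + j] += ai * bj  # one coefficient multiplication
            mults += 1
    return product, mults


# (1 + 2X)(3 + 4X) = 3 + 10X + 8X^2, using four multiplications
coeffs, count = schoolbook_mul([1, 2], [3, 4])
```

For two n-term inputs the counter reaches n^2, matching the cost stated above.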

If a(X) = a_1 X + a_0 and b(X) = b_1 X + b_0 are two linear polynomials in the
same variable X, then the "classic schoolbook" approach computes all three
coefficients of the product polynomial

a(X) b(X) = a_1 b_1 X^2 + (a_1 b_0 + a_0 b_1) X + a_0 b_0

with four multiplications of the original coefficients, followed by one addition. As
discussed later, other approaches need only three coefficient multiplications.

Fast Multiplication - Karatsuba Style 

In 1962, A. Karatsuba and Yu. Ofman suggested (in Doklady Akad. Nauk
SSSR 145 (1963), 293-294) a new multiplication technique that had an overall
asymptotic cost less than the classic schoolbook's O(n^2). Since 1962, many
variants of Karatsuba have been proposed. This is described further by Donald E.
Knuth in "The Art of Computer Programming", Volume 2, Seminumerical
Algorithms, Third Edition, Addison-Wesley, 1998. More on Karatsuba-style
multiplication appears in "Generalizations of the Karatsuba Algorithm for
Efficient Implementations" by Andre Weimerskirch and Christof Paar
(http://www.crypto.ruhr-uni-bochum.de/Publikationen/texte

In terms of efficiency, the Karatsuba-style multiplication approach (with its 
existing variants) is an improvement over the classic schoolbook approach. Just 
like back in 1962 when Karatsuba suggested a new approach, it is still desirable to 
improve the efficiency (and thus speed) of multiplication.

SUMMARY



A technology generally related to large-scale computations is described 
herein. For example, one implementation, described herein, may be employed in 
the fields of cryptography and digital security systems. An implementation, 
described herein, employs a new and improved variant of the Karatsuba 
multiplication approach.

BRIEF DESCRIPTION OF THE DRAWINGS



The same numbers are used throughout the drawings to reference like 
elements and features. 

Fig. 1 shows an example of a system that may employ a Karatsuba-variant 
calculator in accordance with an implementation described herein. 

Fig. 2 is a flow diagram showing a methodological implementation 
described herein. 

Fig. 3 is an example of a computing operating environment capable of 
(wholly or partially) implementing at least one embodiment described herein.

DETAILED DESCRIPTION

The following description sets forth techniques for performing a six-term
Karatsuba-variant calculation. The techniques may be implemented in many
ways, including on computing systems or computer networks, as part of digital
security, anti-piracy, cryptography architectures, systems, and/or applications.

An example of an embodiment, described herein, may be referred to as an 
"exemplary Karatsuba-variant calculator." 

Minimum Multiplications Function M(n) 

Given a positive integer n, let M(n) denote the minimum number of
coefficient multiplications needed to multiply two polynomials of degree at most
n - 1. Polynomials of degree at most n - 1 are sometimes described
as polynomials having n terms each (where some coefficients may be zero).

M(1) = 1 states, in words, that multiplying two constant
polynomials, having one term each, takes a minimum of one coefficient
multiplication. This situation is trivial. Since degree-zero polynomials are scalars,
one simply multiplies the two scalars.

"Classic Schoolbook" is Not Optimal 

The "classic schoolbook" approach shows that M(n) <= n^2 for all n. In
particular M(2) <= 4. However, we can achieve M(2) <= 3.

Consider the two linear polynomials a(X) = a_1 X + a_0 and b(X) = b_1 X + b_0 in
the same variable X. Their product is

a(X) b(X) = (a_1 X + a_0)(b_1 X + b_0) = a_1 b_1 X^2 + (a_1 b_0 + a_0 b_1) X + a_0 b_0.

Instead of computing all four products a_1 b_1, a_1 b_0, a_0 b_1, a_0 b_0, one can start with a_1 b_1
and a_0 b_0. Use the identity

a_1 b_0 + a_0 b_1 = (a_1 + a_0)(b_1 + b_0) - a_1 b_1 - a_0 b_0.

This identity replaces two multiplications (namely, a_1 b_0 and a_0 b_1) by the single
multiplication (a_1 + a_0)(b_1 + b_0) and some additions (herein, "addition" operations
also encompass "subtraction" operations).

We can summarize this computation with

(a_1 X + a_0)(b_1 X + b_0)
  = a_0 b_0 (-X + 1) + (a_1 + a_0)(b_1 + b_0) X + a_1 b_1 (X^2 - X)      [1]

Within Equation 1, the (-X + 1) factor after a_0 b_0 signifies that a_0 b_0 appears
with a minus sign in the coefficient of X^1 and with a plus sign in the coefficient of
X^0 = 1. The product (a_1 + a_0)(b_1 + b_0) is used only once, with a plus sign in the
coefficient of X^1. The product a_1 b_1 appears with a plus sign in the coefficient of
X^2 and with a minus sign in the coefficient of X^1.

The three products used are a_0 b_0, (a_1 + a_0)(b_1 + b_0), and a_1 b_1. The first two
products are a(0)b(0) and a(1)b(1), the values of the quadratic polynomial
a(X)b(X) evaluated at X = 0 and X = 1. The last product, a_1 b_1, may be interpreted
as a(∞)b(∞), the most significant coefficient of the product. The product
polynomial (degree at most 2 for this example) is uniquely determined by its
values at three distinct points.
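
The three-multiplication identity for two linear polynomials can be sketched as below. The function name `karatsuba_2term` and its argument order are hypothetical choices for illustration.

```python
def karatsuba_2term(a1, a0, b1, b0):
    """Multiply (a1*X + a0) by (b1*X + b0) with three multiplications.

    Returns the coefficients (of X^2, X^1, X^0) of the product.
    """
    low = a0 * b0                               # value of the product at X = 0
    high = a1 * b1                              # leading coefficient ("X = infinity")
    at_one = (a1 + a0) * (b1 + b0)              # value of the product at X = 1
    middle = at_one - high - low                # equals a1*b0 + a0*b1
    return high, middle, low


# (3X + 1)(5X + 9) = 15X^2 + 32X + 9
result = karatsuba_2term(3, 1, 5, 9)
```

Only the three assignments to `low`, `high`, and `at_one` multiply coefficients; the middle coefficient is recovered by subtraction.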

By replacing X by -X, a_1 by -a_1, and b_1 by -b_1, another formula is derived
that uses only three multiplications:

(a_1 X + a_0)(b_1 X + b_0)
  = a_0 b_0 (X + 1) + (a_1 - a_0)(b_1 - b_0)(-X) + a_1 b_1 (X^2 + X).      [2]

Higher Degree - M(n) <= n(n+1)/2

Let n be an arbitrary positive integer. The technique in the last section generalizes
to show M(n) <= n(n+1)/2. Given two input polynomials

a(X) = \sum_{0 <= i <= n-1} a_i X^i

and

b(X) = \sum_{0 <= j <= n-1} b_j X^j

of degree at most n - 1, the product is

a(X) b(X) = \sum_{0 <= i <= n-1} \sum_{0 <= j <= n-1} a_i b_j X^{i+j}.

To elaborate further: Once all products of the form a_i b_i are evaluated, each
a_i b_j + a_j b_i where i < j may be evaluated using one of the identities

a_i b_j + a_j b_i = (a_i + a_j)(b_i + b_j) - a_i b_i - a_j b_j

or

a_i b_j + a_j b_i = a_i b_i + a_j b_j - (a_i - a_j)(b_i - b_j).

This approach has n products of the form a_i b_i and n(n-1)/2 of the form
(a_i + a_j)(b_i + b_j) or (a_i - a_j)(b_i - b_j), for a combined n(n+1)/2 products.
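
A minimal sketch of this construction follows, using the first (sum-based) identity for every cross term. The name `triangular_mul` and the lowest-degree-first coefficient lists are assumptions made for the example.

```python
def triangular_mul(a, b):
    """Multiply two n-term polynomials with n(n+1)/2 coefficient multiplications.

    Coefficient lists are lowest degree first. Returns (product, mults).
    """
    n = len(a)
    assert len(b) == n, "both inputs must have the same number of terms"
    mults = 0

    # n "diagonal" products a_i * b_i
    diag = []
    for i in range(n):
        diag.append(a[i] * b[i])
        mults += 1

    product = [0] * (2 * n - 1)
    for i in range(n):
        product[2 * i] += diag[i]

    # n(n-1)/2 products (a_i + a_j)(b_i + b_j) for i < j
    for i in range(n):
        for j in range(i + 1, n):
            cross = (a[i] + a[j]) * (b[i] + b[j]) - diag[i] - diag[j]
            mults += 1
            product[i + j] += cross  # cross equals a_i*b_j + a_j*b_i
    return product, mults


# (1 + 2X + 3X^2)(4 + 5X + 6X^2), six multiplications instead of nine
coeffs, count = triangular_mul([1, 2, 3], [4, 5, 6])
```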

Example: n = 3 

When n = 3, this approach achieves 6 scalar multiplications (rather than the
9 multiplications used by the classic schoolbook approach). If we start with a(X)
= a_2 X^2 + a_1 X + a_0 and b(X) = b_2 X^2 + b_1 X + b_0, then the product is

a(X) b(X) = (a_2 X^2 + a_1 X + a_0)(b_2 X^2 + b_1 X + b_0)
          = a_2 b_2 X^4 + (a_2 b_1 + a_1 b_2) X^3 + (a_2 b_0 + a_1 b_1 + a_0 b_2) X^2
            + (a_1 b_0 + a_0 b_1) X + a_0 b_0.

To compute all five coefficients, start with a_2 b_2, a_1 b_1, and a_0 b_0. Use the technique
in the last section three times.

a_2 b_1 + a_1 b_2 = (a_2 + a_1)(b_2 + b_1) - a_2 b_2 - a_1 b_1
a_2 b_0 + a_0 b_2 = (a_2 + a_0)(b_2 + b_0) - a_2 b_2 - a_0 b_0
a_1 b_0 + a_0 b_1 = (a_1 + a_0)(b_1 + b_0) - a_1 b_1 - a_0 b_0.

This construction shows M(3) <= 6:

(a_2 X^2 + a_1 X + a_0)(b_2 X^2 + b_1 X + b_0)
  = a_0 b_0 (1 - X - X^2) + a_1 b_1 (-X + X^2 - X^3) + a_2 b_2 (-X^2 - X^3 + X^4)
    + (a_1 + a_0)(b_1 + b_0) X + (a_2 + a_0)(b_2 + b_0) X^2 + (a_2 + a_1)(b_2 + b_1) X^3.

If we instead use the identities

a_2 b_1 + a_1 b_2 = a_2 b_2 + a_1 b_1 - (a_2 - a_1)(b_2 - b_1)
a_2 b_0 + a_0 b_2 = a_2 b_2 + a_0 b_0 - (a_2 - a_0)(b_2 - b_0)
a_1 b_0 + a_0 b_1 = a_1 b_1 + a_0 b_0 - (a_1 - a_0)(b_1 - b_0),

then we end up with

(a_2 X^2 + a_1 X + a_0)(b_2 X^2 + b_1 X + b_0)
  = a_0 b_0 (1 + X + X^2) + a_1 b_1 (X + X^2 + X^3) + a_2 b_2 (X^2 + X^3 + X^4)
    - (a_1 - a_0)(b_1 - b_0) X - (a_2 - a_0)(b_2 - b_0) X^2 - (a_2 - a_1)(b_2 - b_1) X^3.

Fast Multiplication via Recursion - Karatsuba Style

So far our bound M(n) <= n(n+1)/2 remains O(n^2), like the O(n^2) time of the
"classic schoolbook" method but with a smaller constant factor. Karatsuba and
Ofman demonstrated that the asymptotic cost may be improved using recursion
when n is composite.

The bound M(n) <= n(n+1)/2 gives M(4) <= 10. The recursive algorithm will
show M(4) <= M(2)^2 <= 9. Consider the two four-term (degree <= 3) polynomials:

a(X) = a_3 X^3 + a_2 X^2 + a_1 X + a_0 and b(X) = b_3 X^3 + b_2 X^2 + b_1 X + b_0.

Letting Y denote X^2, these may be rewritten as:

a(X, Y) = (a_3 X + a_2) Y + (a_1 X + a_0) and b(X, Y) = (b_3 X + b_2) Y + (b_1 X + b_0).

Note that a(X) originally has 4 terms. Group the degree 2 and degree 3
terms together and factor out X^2 = Y. Now a(X) is expressed as a linear
polynomial in Y with coefficients which are polynomials in X. Later, Y may be
replaced by X^2. View the intermediate a(X, Y) and b(X, Y) as linear polynomials
in the variable Y, with polynomial coefficients in the variable X.

After doing the three multiplications

(a_3 X + a_2)(b_3 X + b_2)
(a_1 X + a_0)(b_1 X + b_0)
(a_3 X + a_2 + a_1 X + a_0)(b_3 X + b_2 + b_1 X + b_0)

(products of linear polynomials in the variable X) and doing a few more additions
(of quadratic polynomials in X), one obtains the polynomial product a(X, Y) b(X, Y).
If Y is replaced by X^2 and a few more additions are done (by combining terms with
the same power of X), then one obtains a(X) b(X). In doing so, one uses only nine
multiplications in the base field (or ring), three for each product of two linear
polynomials in X. Therefore M(4) <= 9.

Example 

For example, to form the product a(X) b(X) where

a(X) = 3X^3 + X^2 + 4X + 1 and b(X) = 5X^3 + 9X^2 + 2X + 6,

this approach uses the three products

(3X + 1)(5X + 9) = 15X^2 + 32X + 9
(4X + 1)(2X + 6) = 8X^2 + 26X + 6
(7X + 2)(7X + 15) = 49X^2 + 119X + 30.

In this example, the polynomial product (3X + 1)(5X + 9) (for example)
needs only the three coefficient multiplications: 3 x 5 = 15, (3 + 1) x (5 + 9) = 4 x
14 = 56 and 1 x 9 = 9. The middle coefficient of that product is 56 - 15 - 9 = 32.
Similar observations apply to the other two products of linear (in X) factors.

At the next level, the middle coefficient is

(49X^2 + 119X + 30) - (15X^2 + 32X + 9) - (8X^2 + 26X + 6) = 26X^2 + 61X + 15

and the product a(X) b(X) is (with Y = X^2):

a(X) b(X) = (15X^2 + 32X + 9) Y^2 + (26X^2 + 61X + 15) Y + (8X^2 + 26X + 6)
          = 15X^6 + 32X^5 + (9 + 26) X^4 + 61X^3 + (15 + 8) X^2 + 26X + 6
          = 15X^6 + 32X^5 + 35X^4 + 61X^3 + 23X^2 + 26X + 6.
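
The worked example above can be reproduced with a short recursive routine. This is a sketch under stated assumptions: the name `karatsuba` is hypothetical, coefficient lists are lowest degree first, and the number of terms is assumed to be a power of 2.

```python
def karatsuba(a, b):
    """Recursive Karatsuba product of two n-term polynomials, n a power of 2.

    Returns (product coefficients, number of coefficient multiplications).
    """
    n = len(a)
    assert len(b) == n and n & (n - 1) == 0, "length must be a power of 2"
    if n == 1:
        return [a[0] * b[0]], 1

    half = n // 2
    a_lo, a_hi = a[:half], a[half:]
    b_lo, b_hi = b[:half], b[half:]

    lo, m_lo = karatsuba(a_lo, b_lo)
    hi, m_hi = karatsuba(a_hi, b_hi)
    a_sum = [x + y for x, y in zip(a_lo, a_hi)]
    b_sum = [x + y for x, y in zip(b_lo, b_hi)]
    mid, m_mid = karatsuba(a_sum, b_sum)
    mid = [m - l - h for m, l, h in zip(mid, lo, hi)]  # middle coefficient in Y

    # Substitute Y = X^half and combine terms with equal powers of X
    product = [0] * (2 * n - 1)
    for k, c in enumerate(lo):
        product[k] += c
    for k, c in enumerate(mid):
        product[k + half] += c
    for k, c in enumerate(hi):
        product[k + 2 * half] += c
    return product, m_lo + m_hi + m_mid


# a(X) = 3X^3 + X^2 + 4X + 1 and b(X) = 5X^3 + 9X^2 + 2X + 6
coeffs, count = karatsuba([1, 4, 1, 3], [6, 2, 9, 5])
```

For these inputs the routine performs nine coefficient multiplications, and `coeffs` lists the product coefficients from the constant term upward.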

Basis for recursion - M(n_1 n_2) <= M(n_1) M(n_2)

The last construction illustrates how one can multiply two polynomials
each with n_1 n_2 terms using M(n_1) multiplications of polynomials with n_2 terms,
by breaking the input into n_1 blocks each of length n_2.

A corollary is M(n_1 n_2) <= M(n_1) M(n_2).

In particular, when n = 2^k is a power of 2, this recursive technique
multiplies two polynomials of degree at most n - 1 with M(2)^k <= 3^k ≈ n^1.585
multiplications rather than the brute-force n^2 = 4^k multiplications of the classic
schoolbook approach.

Lowering the asymptotic exponent from 2 to 1.585 makes an enormous
difference when n is large. Even for the modest n = 2^5 = 32, this technique uses
243 multiplications rather than 1024, a four-fold improvement. This is a simple
example of high-speed multiplication.

So far we have glossed over the number of additions needed. If n is a
power of 2 and A(n) denotes the number of coefficient additions needed to
multiply two polynomials of degree at most n - 1 by this method, then A(1) = 0 and
A(2n) = 3A(n) + 8n - 4. The solution is A(2^k) = 6*3^k - 8*2^k + 2, approximately six
times the number of multiplications used.
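
The addition-count recurrence and its closed form can be checked numerically. The sketch below simply evaluates both expressions; the function names are hypothetical.

```python
def additions(n):
    """Evaluate A(n) from A(1) = 0 and A(2n) = 3*A(n) + 8n - 4 (n a power of 2)."""
    if n == 1:
        return 0
    half = n // 2
    return 3 * additions(half) + 8 * half - 4


def additions_closed_form(k):
    """Closed form A(2^k) = 6*3^k - 8*2^k + 2."""
    return 6 * 3**k - 8 * 2**k + 2


# Compare the recurrence with the closed form, alongside the 3^k multiplications
rows = [(2**k, 3**k, additions(2**k), additions_closed_form(k)) for k in range(6)]
```

Each row holds n, the multiplication count 3^k, and the two addition counts, which agree for every k tried.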

Handling Odd Degrees 

It is easy to show M(2n + 1) <= M(n) + 2M(n + 1). Each input polynomial
with 2n + 1 terms is split into one piece with n terms and one piece with n + 1
terms.

Proceeding recursively then gives an overall cost of O(n^(lg 3)) ≈ O(n^1.585),
where lg denotes base-2 logarithm, even if n is not restricted to powers of 2.

Multiplication by Interpolation 

Interpolation approaches, such as that described in Section 3.7.3, p. 79 of H. J.
Nussbaumer ("Fast Fourier Transform and Convolution Algorithms", 2nd Ed.,
Springer-Verlag, Berlin, 1982), give a formula for multiplying two quadratic
polynomials with five multiplications (rather than the six multiplications required
by the conventional approach) when one of the two input polynomials will be
(re)used for several different products. Nussbaumer evaluates the degree-4
product at five values of X (namely at -1, 0, 1, 2, and ∞), interpolating to get the
five output coefficients.

However, Nussbaumer's formula requires a division by 6. Such divisions
are not allowed in algebras of characteristic 2 or 3. The five points of evaluation
are not distinct in characteristics 2 and 3.



Known Upper Bounds on M(n) for Small n

Using the results so far, the following table summarizes the bounds:

n     M(n) <=    Reason
2     3          *
3     6          *
4     9          *
5     15         5(5+1)/2 or M(2) + 2M(3)
6     18         M(2) M(3)
7     24         M(3) + 2M(4)
8     27         M(2) M(4)

TABLE 1

* = Reason discussed above
These bounds agree with those in Appendix A of Weimerskirch and Paar. 
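
The bounds in Table 1 follow mechanically from the three rules stated so far: M(n) <= n(n+1)/2, M(n_1 n_2) <= M(n_1) M(n_2), and M(2n + 1) <= M(n) + 2M(n + 1). The sketch below takes the best of these for each n; the function name `upper_bound` is an assumption for illustration, and the result is only an upper bound derived from these rules, not the true minimum.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def upper_bound(n):
    """Best upper bound on M(n) obtainable from the three rules in the text."""
    if n == 1:
        return 1
    best = n * (n + 1) // 2                      # M(n) <= n(n+1)/2
    for n1 in range(2, n):
        if n % n1 == 0:                          # M(n1*n2) <= M(n1)*M(n2)
            best = min(best, upper_bound(n1) * upper_bound(n // n1))
    if n % 2 == 1:                               # M(2k+1) <= M(k) + 2*M(k+1)
        k = n // 2
        best = min(best, upper_bound(k) + 2 * upper_bound(k + 1))
    return best


table = {n: upper_bound(n) for n in range(2, 9)}
```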



Cryptographic Applications of Polynomial Multiplication 

Cryptographic applications of polynomial multiplication include large integer 
multiplication and finite field arithmetic. We describe those briefly below. 

Application to Large Integer Multiplication 

Some cryptographic algorithms such as RSA require multiplication of
large integers. The inputs may be 1024-bit integers or larger. Typically, these
numbers are represented in radix 2^32 or 2^64 within a computer, requiring 32
(= 1024/32) or 16 (= 1024/64) words to represent each 1024-bit number.

For example, fix a base R and a length n. To multiply two large integers A
and B between 0 and R^n - 1, start with their radix-R representations

A = \sum_{0 <= i <= n-1} a_i R^i and B = \sum_{0 <= j <= n-1} b_j R^j

where 0 <= a_i < R and 0 <= b_j < R.

Two polynomials are introduced here:

a(X) = \sum_{0 <= i <= n-1} a_i X^i and b(X) = \sum_{0 <= j <= n-1} b_j X^j.

These polynomials are selected so that A = a(R) and B = b(R). Compute the
polynomial product a(X) b(X) and substitute X = R. With those changes, the
integers may be rewritten:

A = a(R) = \sum_{0 <= i <= n-1} a_i R^i and B = b(R) = \sum_{0 <= j <= n-1} b_j R^j.

Those who are skilled in the art are familiar with additional details
necessary to perform this sort of large integer multiplication at this point.
Examples of such details are found in Chapter 8 of Alfred V. Aho, John E.
Hopcroft, & Jeffrey D. Ullman, "The Design and Analysis of Computer
Algorithms", Addison-Wesley, Reading, Massachusetts, 1974.

Example

For example, to multiply the two integers A = 3141 and B = 5926 in radix R
= 10, one starts with

a(X) = 3X^3 + X^2 + 4X + 1 and b(X) = 5X^3 + 9X^2 + 2X + 6.

These polynomial coefficients come directly from the decimal expansions
of A and B. As in an earlier example, form the polynomial product

a(X) b(X) = 15X^6 + 32X^5 + 35X^4 + 61X^3 + 23X^2 + 26X + 6

(nine multiplications of coefficients suffice). Substitute X = 10 to get

AB = a(10) b(10) = 15000000 + 3200000 + 350000 + 61000 + 2300 + 260 + 6
   = 18613566.

Because coefficients of the product a(X) b(X) may exceed R - 1, carry
propagation is needed during the final phase. For example, the 6 from 61 in 61X^3
is added to the 35 from 35X^4, giving 41; this 1 is part of the product and the 4 is
added to the 32 from 32X^5.
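
The integer multiplication procedure, including the final carry propagation, might be sketched as follows. All function names are hypothetical, and a plain schoolbook polynomial product stands in for whichever fast polynomial multiplier is actually used.

```python
def to_digits(value, radix):
    """Radix-R digits of a non-negative integer, least significant first."""
    digits = []
    while value:
        value, d = divmod(value, radix)
        digits.append(d)
    return digits or [0]


def poly_mul(a, b):
    """Stand-in polynomial product (schoolbook); any fast method could be used."""
    product = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            product[i + j] += ai * bj
    return product


def carry_propagate(coeffs, radix):
    """Reduce product coefficients to proper digits, pushing carries upward."""
    digits, carry = [], 0
    for c in coeffs:
        carry, d = divmod(c + carry, radix)
        digits.append(d)
    while carry:
        carry, d = divmod(carry, radix)
        digits.append(d)
    return digits


def multiply_integers(A, B, radix=10):
    coeffs = poly_mul(to_digits(A, radix), to_digits(B, radix))
    digits = carry_propagate(coeffs, radix)
    return sum(d * radix**k for k, d in enumerate(digits))


product = multiply_integers(3141, 5926)
```

With A = 3141 and B = 5926 the intermediate coefficients are 6, 26, 23, 61, 35, 32, 15 (lowest power first), and carry propagation turns them into the decimal digits of AB.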
Finite Field Extensions

Another application of polynomials occurs when taking extensions of a
finite field. Let p be a prime. Set K = GF(p), the finite field with p elements.
Choose an extension degree m > 0 and an irreducible polynomial F(X) of degree m
over K. The extension GF(p^m) consists of all polynomials of degree at most m - 1,
with coefficients in the finite field K.

To multiply two elements of GF(p^m), form their polynomial product (of
degree at most 2m - 2) and reduce this product modulo F(X), aiming for (close to)
M(m) multiplications in the base field K. The field polynomial F(X) is often
chosen to make the reduction step easy (e.g., by having few nonzero coefficients).
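
A minimal sketch of multiplication in GF(p^m) follows. It assumes a monic field polynomial F(X) given as a coefficient list (lowest degree first) and uses a plain schoolbook product for the first step; the name `gf_ext_mul` and the choice of GF(3^2) with F(X) = X^2 + 1 in the usage lines are illustrative assumptions.

```python
def gf_ext_mul(a, b, modulus, p):
    """Multiply two elements of GF(p^m) in polynomial-basis representation.

    a, b: coefficient lists of length m, lowest degree first.
    modulus: coefficients of the monic irreducible F(X), length m + 1.
    """
    m = len(modulus) - 1
    assert len(a) == m and len(b) == m and modulus[m] == 1

    # Step 1: polynomial product of degree at most 2m - 2 (schoolbook here)
    prod = [0] * (2 * m - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p

    # Step 2: reduce modulo F(X), clearing the highest coefficient first
    for k in range(2 * m - 2, m - 1, -1):
        c = prod[k]
        if c:
            for t in range(m + 1):
                prod[k - m + t] = (prod[k - m + t] - c * modulus[t]) % p
    return prod[:m]


# GF(9) built as GF(3)[X] / (X^2 + 1); X^2 + 1 has no root modulo 3
F = [1, 0, 1]
x_squared = gf_ext_mul([0, 1], [0, 1], F, 3)   # X * X reduces to -1 = 2
```

A sparse F(X) keeps the inner reduction loop short, which is the motivation given above for choosing a field polynomial with few nonzero coefficients.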

Arithmetic in GF(2^m)

Today's high-performance computers typically use binary arithmetic, in
which each bit has exactly two possible values. Standard approaches for elliptic
curve cryptosystems allow arithmetic modulo a prime p or over a field GF(2^m). For
the purposes of clarity, two encodings for elements of GF(2^m) (polynomial basis or
normal basis) are allowed herein. A polynomial basis is assumed.

It takes m bits to store an arbitrary element of GF(2^m). On a b-bit computer
(where b is typically 32 or 64, but may take other values), these bits fit in n =
CEIL(m/b) words. [CEIL(x) rounds its real argument x up to the next integer.]
The polynomial basis encoding of a typical field element

α = \sum_{0 <= i <= m-1} a_i X^i

(each a_i = 0 or 1) can store a_0 to a_{b-1} in one b-bit computer word, then a_b to a_{2b-1} in
another word, etc. The (high-order) (n - 1)-th word uses only m - (n - 1)b of its b
bits, with the unused bits typically set to zero.

For example, if b = 32 and m = 163, then n = CEIL(163/32) = 6 words
suffice to hold an arbitrary element of GF(2^163) on a 32-bit machine. Five words
hold 32 bits each. The sixth word holds 3 bits (coefficients of X^160 to X^162), with
its other 29 bits unused.

This encoding makes addition of two field elements very easy: it
corresponds to n applications of the bitwise exclusive "OR" instruction found on
most binary machines (one exclusive OR per word).

Subtraction is the same as addition in this algebra: x + y = x - y for all x, y
in GF(2^m). In particular, 1 + 1 = 0.
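
The word-packed encoding and XOR-based addition can be illustrated as below. The helper names are hypothetical, and a Python integer whose bit i holds the coefficient of X^i is used as the unpacked form purely for convenience.

```python
def word_count(m, b):
    """n = CEIL(m / b), the number of b-bit words needed for m bits."""
    return (m + b - 1) // b


def to_words(value, m, b):
    """Split an m-bit element (bit i = coefficient of X^i) into b-bit words."""
    mask = (1 << b) - 1
    return [(value >> (b * k)) & mask for k in range(word_count(m, b))]


def gf2m_add(x_words, y_words):
    """Add two GF(2^m) elements: one exclusive OR per word."""
    return [xw ^ yw for xw, yw in zip(x_words, y_words)]


# GF(2^163) on a 32-bit machine: six words, the last holding only 3 bits
x = to_words((1 << 162) | 1, 163, 32)     # the element X^162 + 1
y = to_words(1 << 162, 163, 32)           # the element X^162
total = gf2m_add(x, y)                    # X^162 cancels, leaving 1
```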

The polynomial multiplication operates on polynomials with b bits per
input coefficient. An earlier section described how to multiply two polynomials
each with n_1 n_2 terms using M(n_1) multiplications of polynomials of degree at most
n_2 - 1, by breaking the input into n_1 blocks each of length n_2. Apply that
construction with n_1 = n and n_2 = b. Use a specialized method for multiplying two
b-term polynomials (stored in b-bit words) over GF(2), invoking that method M(n)
times. Pad the original operands with nb - m leading zeros. As in the integer
arithmetic section, carry propagation is needed on the outputs since the output
coefficients have 2b - 1 bits each.

Exemplary Computation System 

The one or more exemplary implementations, described herein, of the
present claimed invention may be implemented (in whole or in part) by a
Karatsuba-variant calculation unit 130 and/or by a computing environment like
that shown in Fig. 3.

Although the exemplary Karatsuba-variant calculator, described herein, is
valid in any characteristic, it is especially useful in characteristic 2 or 3, meaning
algebras in which 1 + 1 = 0 or 1 + 1 + 1 = 0. For an application of characteristic 3
fields to cryptography, see "Implementing the Tate Pairing" by Steven D.
Galbraith et al. in Algorithmic Number Theory, 5th International Symposium,
ANTS V, Sydney, Australia, July, 2002, Springer-Verlag LNCS 2369, pp. 324-337.

Fig. 1 shows an example of a computation system 100 that employs the 
Karatsuba-variant calculation unit 130. Such a system may be used to compute 
large integers and/or polynomials. It may also be used for multiplication 
computations in finite field extensions. 

The system includes an input unit 110 for receiving the input data to be 
calculated. It has a memory 120 and the Karatsuba-variant calculation unit 130. It 
also has an output unit 140 for communicating the results of such calculations. 

Pairwise Multiplication 

The calculations of the exemplary Karatsuba-variant calculator may be
performed recursively. The product of two one-term (i.e., constant) polynomials
is found using multiplication in the ring in which the coefficients lie. But the
more complicated (a_1 X + a_0)(b_1 X + b_0) requires three products, namely a_0 b_0,
(a_1 + a_0)(b_1 + b_0), and a_1 b_1.

These three products are done by invoking the algorithm recursively. As we do so,
we append the three pairs of inputs (a_0, b_0), (a_1 + a_0, b_1 + b_0), and (a_1, b_1) to a
queue of products we're waiting on. The subsequent processing of these pairs
may insert additional entries in the waiting list. Once all three products have been
completed, the procedure which queued them can complete its task.

Exemplary Karatsuba-Variant Calculator

The Karatsuba-variant calculation unit 130 of the computation system 100
employs an improved variant of the Karatsuba multiplication approach. More
specifically, the Karatsuba-variant calculation unit 130 employs the exemplary
Karatsuba-variant calculator, as described herein, which is generalized for any
n-digit number or n-term polynomial, where n is a positive integer.

The exemplary Karatsuba-variant calculator herein has one or more
embodiments to multiply pairs of polynomials with six terms each (i.e., pairs of
polynomials of degree at most 5). It achieves M(6) <= 17, compared with the M(6)
<= 18 in Table 1 above. These embodiments may work in arbitrary characteristic:
all coefficients are integers, so there are no divisions. Like the original Karatsuba,
it does not assume multiplication is commutative. These embodiments may be
used in recursive constructions to achieve, for example,

M(11) <= M(5) + 2M(6) <= 15 + 2*17 = 49,
M(12) <= M(2) M(6) <= 3*17 = 51,
M(13) <= M(6) + 2M(7) <= 17 + 2*24 = 65,
M(36) <= M(6) M(6) <= 17*17 = 289.

All of these beat the M(n) <= n(n+1)/2 bound.

Recursive use hereof yields M(6^k) <= 17^k for large k. This yields an
asymptotic bound M(n) = O(n^c) with c = log(17)/log(6) ≈ 1.58125. This beats the
original Karatsuba exponent log(3)/log(2) ≈ 1.58496.
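
The four example bounds and the two asymptotic exponents can be recomputed directly. The dictionary `M` below simply records the upper bounds quoted in the text (it is not a computation of true minima), and the variable names are illustrative.

```python
import math

# Upper bounds quoted in the text: M(2), M(5), M(7) from Table 1, M(6) <= 17
M = {2: 3, 5: 15, 6: 17, 7: 24}

improved = {
    11: M[5] + 2 * M[6],   # M(2n+1) <= M(n) + 2*M(n+1) with n = 5
    12: M[2] * M[6],       # M(n1*n2) <= M(n1)*M(n2)
    13: M[6] + 2 * M[7],   # n = 6
    36: M[6] * M[6],
}

# Each improved bound is compared with the n(n+1)/2 bound
triangular = {n: n * (n + 1) // 2 for n in improved}

karatsuba_exponent = math.log(3) / math.log(2)    # from M(2^k) <= 3^k
six_term_exponent = math.log(17) / math.log(6)    # from M(6^k) <= 17^k
```

The gap between the two exponents is small, so the saving over two-term Karatsuba only becomes noticeable for large operands.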

Six-term Karatsuba-variant Calculator using <= 17 Multiplications

The following describes a particular embodiment of the Karatsuba-variant
calculator. In particular, the embodiment described is one for two polynomials
with six terms. Therefore, this may be called an exemplary 6-term
Karatsuba-variant calculator.

This approach reduces the asymptotic behavior from n^1.585 to n^1.581 (where
1.585 ≈ ln(3)/ln(2) and 1.581 ≈ ln(17)/ln(6)).

Two polynomials in the variable X, each with degree at most 5 (i.e., 6-term
polynomials in X) are described as follows:

a(X) = a_0 + a_1 X + a_2 X^2 + a_3 X^3 + a_4 X^4 + a_5 X^5      [3]

and

b(X) = b_0 + b_1 X + b_2 X^2 + b_3 X^3 + b_4 X^4 + b_5 X^5      [4]

Given that description of the 6-term polynomials and letting C be an
arbitrary (polynomial) value, the equation for the exemplary 6-term
Karatsuba-variant calculator is

a(X) b(X)
  = (a_0 + a_1 + a_2 + a_3 + a_4 + a_5)(b_0 + b_1 + b_2 + b_3 + b_4 + b_5) C
  + (a_1 + a_2 + a_4 + a_5)(b_1 + b_2 + b_4 + b_5)(-C + X^6)
  + (a_0 + a_1 + a_3 + a_4)(b_0 + b_1 + b_3 + b_4)(-C + X^4)
  + (a_0 - a_2 - a_3 + a_5)(b_0 - b_2 - b_3 + b_5)(C - X^7 + X^6 - X^5 + X^4 - X^3)
  + (a_0 - a_2 - a_5)(b_0 - b_2 - b_5)(C - X^5 + X^4 - X^3)
  + (a_0 + a_3 - a_5)(b_0 + b_3 - b_5)(C - X^7 + X^6 - X^5)
  + (a_0 + a_1 + a_2)(b_0 + b_1 + b_2)(C - X^7 + X^6 - 2X^5 + 2X^4 - 2X^3 + X^2)
  + (a_3 + a_4 + a_5)(b_3 + b_4 + b_5)(C + X^8 - 2X^7 + 2X^6 - 2X^5 + X^4 - X^3)
  + (a_2 + a_3)(b_2 + b_3)(-2C + X^7 - X^6 + 2X^5 - X^4 + X^3)
  + (a_1 - a_4)(b_1 - b_4)(-C + X^4 - X^5 + X^6)
  + (a_1 + a_2)(b_1 + b_2)(-C + X^7 - 2X^6 + 2X^5 - 2X^4 + 3X^3 - X^2)
  + (a_3 + a_4)(b_3 + b_4)(-C - X^8 + 3X^7 - 2X^6 + 2X^5 - 2X^4 + X^3)
  + (a_0 + a_1)(b_0 + b_1)(-C + X^7 - X^6 + 2X^5 - 3X^4 + 2X^3 - X^2 + X)
  + (a_4 + a_5)(b_4 + b_5)(-C + X^9 - X^8 + 2X^7 - 3X^6 + 2X^5 - X^4 + X^3)
  + a_0 b_0 (-3C + 2X^7 - 2X^6 + 3X^5 - 2X^4 + 2X^3 - X + 1)
  + a_1 b_1 (3C - X^7 - X^5 + X^4 - 3X^3 + 2X^2 - X)
  + a_4 b_4 (3C - X^9 + 2X^8 - 3X^7 + X^6 - X^5 - X^3)
  + a_5 b_5 (-3C + X^10 - X^9 + 2X^7 - 2X^6 + 3X^5 - 2X^4 + 2X^3).      [5]

There are 18 products involving a's and b's, but only 17 of them need be
computed, by adapting the polynomial parameter C. For example, if C = 0 there
is no need to compute (a_0 + a_1 + a_2 + a_3 + a_4 + a_5)(b_0 + b_1 + b_2 + b_3 + b_4 + b_5).

Once C has been chosen so a multiplier vanishes, one can group the
coefficients of each power of X, expressing each coefficient of the product as a
linear combination of the 17 remaining products.
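
Equation 5 with the choice C = 0 can be sketched as a table-driven routine. This is an illustrative sketch, not the claimed implementation: the names `TERMS` and `six_term_mul` are hypothetical, coefficient lists are lowest degree first, and each table row pairs the signs applied to (a_0, ..., a_5) (and likewise to the b's) with that product's multiplier, written as a map from power of X to integer weight.

```python
# With C = 0 the all-coefficients product drops out, leaving 17 products.
TERMS = [
    ((0, 1, 1, 0, 1, 1), {6: 1}),
    ((1, 1, 0, 1, 1, 0), {4: 1}),
    ((1, 0, -1, -1, 0, 1), {3: -1, 4: 1, 5: -1, 6: 1, 7: -1}),
    ((1, 0, -1, 0, 0, -1), {3: -1, 4: 1, 5: -1}),
    ((1, 0, 0, 1, 0, -1), {5: -1, 6: 1, 7: -1}),
    ((1, 1, 1, 0, 0, 0), {2: 1, 3: -2, 4: 2, 5: -2, 6: 1, 7: -1}),
    ((0, 0, 0, 1, 1, 1), {3: -1, 4: 1, 5: -2, 6: 2, 7: -2, 8: 1}),
    ((0, 0, 1, 1, 0, 0), {3: 1, 4: -1, 5: 2, 6: -1, 7: 1}),
    ((0, 1, 0, 0, -1, 0), {4: 1, 5: -1, 6: 1}),
    ((0, 1, 1, 0, 0, 0), {2: -1, 3: 3, 4: -2, 5: 2, 6: -2, 7: 1}),
    ((0, 0, 0, 1, 1, 0), {3: 1, 4: -2, 5: 2, 6: -2, 7: 3, 8: -1}),
    ((1, 1, 0, 0, 0, 0), {1: 1, 2: -1, 3: 2, 4: -3, 5: 2, 6: -1, 7: 1}),
    ((0, 0, 0, 0, 1, 1), {3: 1, 4: -1, 5: 2, 6: -3, 7: 2, 8: -1, 9: 1}),
    ((1, 0, 0, 0, 0, 0), {0: 1, 1: -1, 3: 2, 4: -2, 5: 3, 6: -2, 7: 2}),
    ((0, 1, 0, 0, 0, 0), {1: -1, 2: 2, 3: -3, 4: 1, 5: -1, 7: -1}),
    ((0, 0, 0, 0, 1, 0), {3: -1, 5: -1, 6: 1, 7: -3, 8: 2, 9: -1}),
    ((0, 0, 0, 0, 0, 1), {3: 2, 4: -2, 5: 3, 6: -2, 7: 2, 9: -1, 10: 1}),
]


def six_term_mul(a, b):
    """Multiply two 6-term polynomials using 17 coefficient multiplications.

    Returns (the 11 product coefficients, number of coefficient multiplications).
    """
    assert len(a) == 6 and len(b) == 6
    product = [0] * 11
    mults = 0
    for signs, weights in TERMS:
        # Signed sums of inputs; the factors 0, +1, -1 only select and negate
        a_comb = sum(s * x for s, x in zip(signs, a))
        b_comb = sum(s * y for s, y in zip(signs, b))
        p = a_comb * b_comb          # the one coefficient multiplication
        mults += 1
        # Small integer weights amount to repeated additions or subtractions
        for power, w in weights.items():
            product[power] += w * p
    return product, mults


coeffs, count = six_term_mul([3, 1, 4, 1, 5, 9], [2, 7, 1, 8, 2, 8])
```

Comparing `coeffs` with a schoolbook product of the same inputs is a simple way to confirm a transcription of the table; the same routine could be invoked recursively on blocks of a larger operand.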

Methodological Implementation of the Exemplary Karatsuba-Variant
Calculator

Fig. 2 shows a methodological implementation of the exemplary
Karatsuba-variant calculator performed by the Karatsuba-variant calculation
unit 130 (or some portion thereof). This methodological implementation may be
performed in software, hardware, or a combination thereof. For ease of
understanding, the method steps are delineated as separate
steps; however, these separately delineated steps should not be construed as
necessarily order dependent in their performance.

At 210 of Fig. 2, the exemplary Karatsuba-variant calculator obtains pairs
of input polynomials with a maximum of 6 terms each. There may be only two
polynomials or a pair from a collection of pairs.

At 220, it selects one pair of input polynomials, where the two input 6-term 
polynomials are nominally labeled in accordance with Equations 3 and 4 above. 

If the polynomials have a degree other than 5, then they may be processed
by other portions of the calculation unit 130 which are configured specifically for
polynomials having that degree. Furthermore, polynomials having a degree
greater than 5 may be broken down into multiple polynomials where at least one
of them has a degree of 5.

At 230, the exemplary Karatsuba-variant calculator computes the product 
polynomial of these two input 6-term polynomials. It does so by using Equation 5 
above to calculate the product polynomial, after choosing C. 

At 240, it determines whether any pairs of polynomials remain unselected. 
If so, then it returns to block 220 to repeat the functions of blocks 220, 230, and 
240 for another pair of input polynomials. 

If none remain unselected, then it reports the results (product polynomial)
at 250.

Exemplary Computing System and Environment 

Fig. 3 illustrates an example of a suitable computing environment 300 
within which an exemplary Karatsuba-variant calculator, as described herein, may 
be implemented (either fully or partially). The computing environment 300 may 
be utilized in the computer and network architectures described herein. 

The exemplary computing environment 300 is only one example of a 
computing environment and is not intended to suggest any limitation as to the 
scope of use or functionality of the computer and network architectures. Neither 
should the computing environment 300 be interpreted as having any dependency 
or requirement relating to any one or combination of components illustrated in the 
exemplary computing environment 300. 

The exemplary Karatsuba-variant calculator may be implemented with
numerous other general purpose or special purpose computing system
environments or configurations. Examples of well known computing systems,
environments, and/or configurations that may be suitable for use include, but are
not limited to, personal computers, server computers, thin clients, thick clients,
hand-held or laptop devices, smart cards, multiprocessor systems, mobile phones,
microprocessor-based systems, set-top boxes, programmable
consumer electronics, network PCs, minicomputers, mainframe computers,
distributed computing environments that include any of the above systems or
devices, and the like.

The exemplary Karatsuba-variant calculator may be described in the 
general context of computer-executable instructions, such as program modules, 
being executed by a computer. Generally, program modules include routines, 
programs, objects, components, data structures, etc. that perform particular tasks 
or implement particular abstract data types. The exemplary Karatsuba-variant 
calculator may also be practiced in distributed computing environments where 
tasks are performed by remote processing devices that are linked through a 
communications network. In a distributed computing environment, program 
modules may be located in both local and remote computer storage media 
including memory storage devices. 

The computing environment 300 includes a general-purpose computing
device in the form of a computer 302. The components of computer 302 may
include, but are not limited to, one or more processors or processing units 304, a
system memory 306, and a system bus 308 that couples various system
components including the processor 304 to the system memory 306.

The system bus 308 represents one or more of any of several types of bus 
structures, including a memory bus or memory controller, a peripheral bus, an 
accelerated graphics port, and a processor or local bus using any of a variety of 
bus architectures. By way of example, such architectures may include an Industry 
Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an 
Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) 
local bus, and a Peripheral Component Interconnect (PCI) bus, also known as a 
Mezzanine bus. 

Computer 302 typically includes a variety of computer readable media. 
Such media may be any available media that is accessible by computer 302 and 
includes both volatile and non-volatile media, removable and non-removable 
media. 

The system memory 306 includes computer readable media in the form of 
volatile memory, such as random access memory (RAM) 310, and/or non-volatile 
memory, such as read only memory (ROM) 312. A basic input/output system 
(BIOS) 314, containing the basic routines that help to transfer information 
between elements within computer 302, such as during start-up, is stored in ROM 
312. RAM 310 typically contains data and/or program modules that are 
immediately accessible to and/or presently operated on by the processing unit 304. 

Computer 302 may also include other removable/non-removable, 
volatile/non-volatile computer storage media. By way of example, Fig. 3 
illustrates a hard disk drive 316 for reading from and writing to a non-removable, 
non-volatile magnetic medium (not shown), a magnetic disk drive 318 for reading 
from and writing to a removable, non-volatile magnetic disk 320 (e.g., a "floppy 
disk"), and an optical disk drive 322 for reading from and/or writing to a 
removable, non-volatile optical disk 324 such as a CD-ROM, DVD-ROM, or other 
optical media. The hard disk drive 316, magnetic disk drive 318, and optical disk 
drive 322 are each connected to the system bus 308 by one or more data media 
interfaces 326. Alternatively, the hard disk drive 316, magnetic disk drive 318, 
and optical disk drive 322 may be connected to the system bus 308 by one or more 
interfaces (not shown). 

The disk drives and their associated computer-readable media provide non- 
volatile storage of computer readable instructions, data structures, program 
modules, and other data for computer 302. Although the example illustrates a hard 
disk 316, a removable magnetic disk 320, and a removable optical disk 324, it is to 
be appreciated that other types of computer readable media which may store data 
that is accessible by a computer, such as magnetic cassettes or other magnetic 
storage devices, flash memory cards, CD-ROM, digital versatile disks (DVD) or 
other optical storage, random access memories (RAM), read only memories 
(ROM), electrically erasable programmable read-only memory (EEPROM), and 
the like, may also be utilized to implement the exemplary computing system and 
environment. 

Any number of program modules may be stored on the hard disk 316, 
magnetic disk 320, optical disk 324, ROM 312, and/or RAM 310, including by 
way of example, an operating system 326, one or more application programs 328, 
other program modules 330, and program data 332. 

A user may enter commands and information into computer 302 via input 
devices such as a keyboard 334 and a pointing device 336 (e.g., a "mouse"). 
Other input devices 338 (not shown specifically) may include a microphone, 
joystick, game pad, satellite dish, serial port, scanner, and/or the like. These and 
other input devices are connected to the processing unit 304 via input/output 
interfaces 340 that are coupled to the system bus 308, but may be connected by 
other interface and bus structures, such as a parallel port, game port, or a universal 
serial bus (USB). 

A monitor 342 or other type of display device may also be connected to the 
system bus 308 via an interface, such as a video adapter 344. In addition to the 
monitor 342, other output peripheral devices may include components such as 
speakers (not shown) and a printer 346 which may be connected to computer 302 
via the input/output interfaces 340. 

Computer 302 may operate in a networked environment using logical 
connections to one or more remote computers, such as a remote computing device 
348. By way of example, the remote computing device 348 may be a personal 
computer, portable computer, a server, a router, a network computer, a peer device 
or other common network node, and the like. The remote computing device 348 is 
illustrated as a portable computer that may include many or all of the elements and 
features described herein relative to computer 302. 

Logical connections between computer 302 and the remote computer 348 
are depicted as a local area network (LAN) 350 and a general wide area network 
(WAN) 352. Such networking environments are commonplace in offices, 
enterprise-wide computer networks, intranets, and the Internet. 

When implemented in a LAN networking environment, the computer 302 is 
connected to a local network 350 via a network interface or adapter 354. When 
implemented in a WAN networking environment, the computer 302 typically 
includes a modem 356 or other means for establishing communications over the 
wide network 352. The modem 356, which may be internal or external to 
computer 302, may be connected to the system bus 308 via the input/output 
interfaces 340 or other appropriate mechanisms. It is to be appreciated that the 
illustrated network connections are exemplary and that other means of establishing 
communication link(s) between the computers 302 and 348 may be employed. 

In a networked environment, such as that illustrated with computing 
environment 300, program modules depicted relative to the computer 302, or 
portions thereof, may be stored in a remote memory storage device. By way of 
example, remote application programs 358 reside on a memory device of remote 
computer 348. For purposes of illustration, application programs and other 
executable program components such as the operating system are illustrated herein 
as discrete blocks, although it is recognized that such programs and components 
reside at various times in different storage components of the computing device 
302, and are executed by the data processor(s) of the computer. 

Computer-Executable Instructions 

An implementation of an exemplary Karatsuba-variant calculator may be 
described in the general context of computer-executable instructions, such as 
program modules, executed by one or more computers or other devices. 
Generally, program modules include routines, programs, objects, components, data 
structures, etc. that perform particular tasks or implement particular abstract data 
types. Typically, the functionality of the program modules may be combined or 
distributed as desired in various embodiments. 
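
By way of illustration only, one such program module might be sketched as follows in Python. The sketch uses the classic two-term Karatsuba recursion on coefficient lists, not the six-term variant described herein, and the names schoolbook_mul, karatsuba_mul, and threshold are hypothetical choices made for this example rather than elements of any described embodiment.

```python
from typing import List


def schoolbook_mul(a: List[int], b: List[int]) -> List[int]:
    """Classic schoolbook product of two coefficient lists (lowest degree first)."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def _add(a: List[int], b: List[int]) -> List[int]:
    """Coefficient-wise sum, treating the shorter list as zero-padded."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]


def karatsuba_mul(a: List[int], b: List[int], threshold: int = 4) -> List[int]:
    """Two-term Karatsuba product; falls back to schoolbook at small sizes.

    The threshold parameter is an assumption of this sketch and must be >= 1.
    """
    n = max(len(a), len(b))
    if n <= threshold or not a or not b:
        return schoolbook_mul(a, b)
    # Pad both operands to n coefficients, then split each at h.
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    low = karatsuba_mul(a0, b0, threshold)    # a0 * b0
    high = karatsuba_mul(a1, b1, threshold)   # a1 * b1
    mid = karatsuba_mul(_add(a0, a1), _add(b0, b1), threshold)
    # The middle term (a0 + a1)(b0 + b1) - a0*b0 - a1*b1 needs only one
    # extra multiplication, giving three recursive products instead of four.
    mid = _add(mid, [-c for c in low])
    mid = _add(mid, [-c for c in high])
    out = [0] * (2 * n - 1)
    for i, c in enumerate(low):
        out[i] += c
    for i, c in enumerate(mid):
        out[i + h] += c
    for i, c in enumerate(high):
        out[i + 2 * h] += c
    return out
```

In this sketch the routines are plain functions, so they could be combined into one module or distributed across several, consistent with the description above.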

Exemplary Operating Environment 

Fig. 3 illustrates an example of a suitable operating environment 300 in 
which an exemplary Karatsuba-variant calculator may be implemented. 
Specifically, the exemplary Karatsuba-variant calculator(s) described herein may 
be implemented (wholly or in part) by any program modules 328-330 and/or 
operating system 326 in Fig. 3 or a portion thereof. 

The operating environment is only an example of a suitable operating 
environment and is not intended to suggest any limitation as to the scope of use or 
functionality of the exemplary Karatsuba-variant calculator(s) described herein. 
Other well known computing systems, environments, and/or configurations that 
are suitable for use include, but are not limited to, personal computers (PCs), 
server computers, hand-held or laptop devices, multiprocessor systems, 
microprocessor-based systems, smart cards, programmable consumer electronics, 
wireless phones and equipment, general- and special-purpose appliances, 
application-specific integrated circuits (ASICs), network PCs, minicomputers, 
mainframe computers, distributed computing environments that include any of the 
above systems or devices, and the like. 

Computer Readable Media 

An implementation of an exemplary Karatsuba-variant calculator may be 
stored on or transmitted across some form of computer readable media. Computer 
readable media may be any available media that may be accessed by a computer. 
By way of example, and not limitation, computer readable media may comprise 
"computer storage media" and "communications media." 

"Computer storage media" include volatile and non-volatile, removable and 
non-removable media implemented in any method or technology for storage of 
information such as computer readable instructions, data structures, program 
modules, or other data. Computer storage media includes, but is not limited to, 
RAM, ROM, EEPROM, smart cards, flash memory or other memory technology, 
CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic 
cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, 
or any other medium which may be used to store the desired information and 
which may be accessed by a computer. 

"Communication media" typically embodies computer readable 
instructions, data structures, program modules, or other data in a modulated data 
signal, such as a carrier wave or other transport mechanism. Communication media 
also includes any information delivery media. 

The term "modulated data signal" means a signal that has one or more of its 
characteristics set or changed in such a manner as to encode information in the 
signal. By way of example, and not limitation, communication media includes 
wired media such as a wired network or direct-wired connection, and wireless 
media such as acoustic, RF, infrared, and other wireless media. Combinations of 
any of the above are also included within the scope of computer readable media. 

Conclusion 

Although the invention has been described in language specific to structural 
features and/or methodological steps, it is to be understood that the invention 
defined in the appended claims is not necessarily limited to the specific features or 
steps described. Rather, the specific features and steps are disclosed as preferred 
forms of implementing the claimed invention. 